Check the status field. TRUE covers water points that are functional, including those that are functional but not in use, whereas FALSE covers non-functional water points. Water points with unknown status have been removed.
osun_wp_sf %>% freq(input = "status")
  status frequency percentage cumulative_perc
1   TRUE      2642       55.5            55.5
2  FALSE      2118       44.5           100.0
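As a cross-check on the table above, the same counts and percentages can be reproduced in base R. The sketch below rebuilds a logical vector from the reported counts instead of reading osun_wp_sf, so it illustrates the calculation only and is not the funModeling implementation.

```r
# Rebuild a logical vector with the counts reported above
# (2642 TRUE, 2118 FALSE); it stands in for osun_wp_sf$status.
status <- rep(c(TRUE, FALSE), times = c(2642, 2118))

counts <- table(status)                       # frequency of each level
pct    <- round(100 * prop.table(counts), 1)  # share of each level, in percent

print(counts)
print(pct)
```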
Warning: Couldn't find skimmers for class: sfc_POINT, sfc; No user-defined `sfl`
provided. Falling back to `character`.
Data summary
Name                     Piped data
Number of rows           4760
Number of columns        75
_______________________
Column type frequency:
  character              47
  logical                5
  numeric                23
________________________
Group variables          None
Variable type: character
skim_variable             n_missing  complete_rate  min  max  empty  n_unique  whitespace
source                            0           1.00    5   44      0         2           0
report_date                       0           1.00   22   22      0        42           0
status_id                         0           1.00    2    7      0         3           0
water_source_clean                0           1.00    8   22      0         3           0
water_source_category             0           1.00    4    6      0         2           0
water_tech_clean                 24           0.99    9   23      0         3           0
water_tech_category              24           0.99    9   15      0         2           0
facility_type                     0           1.00    8    8      0         1           0
clean_country_name                0           1.00    7    7      0         1           0
clean_adm1                        0           1.00    3    5      0         5           0
clean_adm2                        0           1.00    3   14      0        35           0
clean_adm3                     4760           0.00   NA   NA      0         0           0
clean_adm4                     4760           0.00   NA   NA      0         0           0
installer                      4760           0.00   NA   NA      0         0           0
management_clean               1573           0.67    5   37      0         7           0
status_clean                      0           1.00    9   32      0         7           0
pay                               0           1.00    2   39      0         7           0
fecal_coliform_presence        4760           0.00   NA   NA      0         0           0
subjective_quality                0           1.00   18   20      0         4           0
activity_id                    4757           0.00   36   36      0         3           0
scheme_id                      4760           0.00   NA   NA      0         0           0
wpdx_id                           0           1.00   12   12      0      4760           0
notes                             0           1.00    2   96      0      3502           0
orig_lnk                       4757           0.00   84   84      0         1           0
photo_lnk                        41           0.99   84   84      0      4719           0
country_id                        0           1.00    2    2      0         1           0
data_lnk                          0           1.00   79   96      0         2           0
water_point_history               0           1.00  142  834      0      4750           0
clean_country_id                  0           1.00    3    3      0         1           0
country_name                      0           1.00    7    7      0         1           0
water_source                      0           1.00    8   30      0         4           0
water_tech                        0           1.00    5   37      0        20           0
adm2                              0           1.00    3   14      0        33           0
adm3                           4760           0.00   NA   NA      0         0           0
management                     1573           0.67    5   47      0         7           0
adm1                              0           1.00    4    5      0         4           0
New Georeferenced Column          0           1.00   16   35      0      4760           0
lat_lon_deg                       0           1.00   13   32      0      4760           0
public_data_source                0           1.00   84  102      0         2           0
converted                         0           1.00   53   53      0         1           0
created_timestamp                 0           1.00   22   22      0         2           0
updated_timestamp                 0           1.00   22   22      0         2           0
Geometry                          0           1.00   33   37      0      4760           0
ADM2_EN                           0           1.00    3   14      0        30           0
ADM2_PCODE                        0           1.00    8    8      0        30           0
ADM1_EN                           0           1.00    4    4      0         1           0
ADM1_PCODE                        0           1.00    5    5      0         1           0
Variable type: logical
skim_variable  n_missing  complete_rate  mean  count
rehab_year          4760              0   NaN  :
rehabilitator       4760              0   NaN  :
is_urban               0              1  0.39  FAL: 2884, TRU: 1876
latest_record          0              1  1.00  TRU: 4760
status                 0              1  0.56  TRU: 2642, FAL: 2118
Variable type: numeric
skim_variable               n_missing  complete_rate       mean        sd        p0        p25        p50        p75       p100  hist
row_id                              0           1.00   68550.48  10216.94  49601.00   66874.75   68244.50   69562.25  471319.00  ▇▁▁▁▁
lat_deg                             0           1.00       7.68      0.22      7.06       7.51       7.71       7.88       8.06  ▁▂▇▇▇
lon_deg                             0           1.00       4.54      0.21      4.08       4.36       4.56       4.71       5.06  ▃▆▇▇▂
install_year                     1144           0.76    2008.63      6.04   1917.00    2006.00    2010.00    2013.00    2015.00  ▁▁▁▁▇
fecal_coliform_value             4760           0.00        NaN        NA        NA         NA         NA         NA         NA
distance_to_primary_road            0           1.00    5021.53   5648.34      0.01     719.36    2972.78    7314.73   26909.86  ▇▂▁▁▁
distance_to_secondary_road          0           1.00    3750.47   3938.63      0.15     460.90    2554.25    5791.94   19559.48  ▇▃▁▁▁
distance_to_tertiary_road           0           1.00    1259.28   1680.04      0.02     121.25     521.77    1834.42   10966.27  ▇▂▁▁▁
distance_to_city                    0           1.00   16663.99  10960.82     53.05    7930.75   15030.41   24255.75   47934.34  ▇▇▆▃▁
distance_to_town                    0           1.00   16726.59  12452.65     30.00    6876.92   12204.53   27739.46   44020.64  ▇▅▃▃▂
rehab_priority                   2654           0.44     489.33   1658.81      0.00       7.00      91.50     376.25   29697.00  ▇▁▁▁▁
water_point_population              4           1.00     513.58   1458.92      0.00      14.00     119.00     433.25   29697.00  ▇▁▁▁▁
local_population_1km                4           1.00    2727.16   4189.46      0.00     176.00    1032.00    3717.00   36118.00  ▇▁▁▁▁
crucialness_score                 798           0.83       0.26      0.28      0.00       0.07       0.15       0.35       1.00  ▇▃▁▁▁
pressure_score                    798           0.83       1.46      4.16      0.00       0.12       0.41       1.24      93.69  ▇▁▁▁▁
usage_capacity                      0           1.00     560.74    338.46    300.00     300.00     300.00    1000.00    1000.00  ▇▁▁▁▅
days_since_report                   0           1.00    2692.69     41.92   1483.00    2688.00    2693.00    2700.00    4645.00  ▁▇▁▁▁
staleness_score                     0           1.00      42.80      0.58     23.13      42.70      42.79      42.86      62.66  ▁▁▇▁▁
location_id                         0           1.00  235865.49   6657.60  23741.00  230638.75  236199.50  240061.25  267454.00  ▁▁▁▁▇
cluster_size                        0           1.00       1.05      0.25      1.00       1.00       1.00       1.00       4.00  ▇▁▁▁▁
lat_deg_original                 4760           0.00        NaN        NA        NA         NA         NA         NA         NA
lon_deg_original                 4760           0.00        NaN        NA        NA         NA         NA         NA         NA
count                               0           1.00       1.00      0.00      1.00       1.00       1.00       1.00       1.00  ▁▁▇▁▁
We will trim the Osun dataset so that it keeps only the independent variables of interest. We also convert usage_capacity to a factor (categorical variable) because it takes only two values: 300 and 1000.
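The clean-up step can be sketched in base R as follows. The data frame wp and its values are made up for illustration, and the list of retained columns is an assumption rather than the exact selection used in this analysis.

```r
# Toy stand-in for the Osun water point data (all values are made up).
wp <- data.frame(
  status           = c(TRUE, FALSE, TRUE, FALSE),
  distance_to_city = c(1200, 5300, 9800, 15000),
  usage_capacity   = c(300, 1000, 300, 1000),
  notes            = c("a", "b", "c", "d")   # a column we do not need
)

# Keep only the response and the predictors of interest.
keep     <- c("status", "distance_to_city", "usage_capacity")
wp_clean <- wp[, keep]

# Only two distinct values, so treat usage_capacity as categorical.
wp_clean$usage_capacity <- as.factor(wp_clean$usage_capacity)

levels(wp_clean$usage_capacity)
```

Storing usage_capacity as a factor lets a regression model estimate a separate effect for each level instead of a single slope across 300 and 1000.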
Note that an sf data frame is not suitable for correlation analysis because it carries a geometry column. We can drop that column with st_set_geometry(NULL) or st_drop_geometry().
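A minimal sketch of this step, assuming the sf package is installed. The four points and their attribute values are invented for illustration; only st_as_sf(), st_drop_geometry() and cor() are real calls.

```r
library(sf)

# Toy point data (made-up coordinates and attributes).
pts <- data.frame(
  lon                  = c(4.50, 4.55, 4.60, 4.65),
  lat                  = c(7.60, 7.70, 7.75, 7.80),
  distance_to_city     = c(1200, 5300, 9800, 15000),
  local_population_1km = c(3100, 1800, 900, 400)
)

# Build an sf object; lon/lat are consumed into the geometry column.
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)

# Drop the geometry so that only plain numeric columns remain.
pts_df <- st_drop_geometry(pts_sf)

# The correlation matrix can now be computed on the attribute columns.
cor_mat <- cor(pts_df)
cor_mat
```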
Note that the bandwidth to use is not necessarily the last value printed, because the code chunk above iterates through a series of candidate bandwidths. To get the bandwidth with the optimal AICc value, we should inspect bw.fixed.
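The point about the last iteration can be illustrated with a toy search trace. The bandwidths and AICc scores below are invented and are not output from this analysis; the sketch only shows that the final candidate tried by an iterative search need not be the one with the lowest AICc.

```r
# Hypothetical search trace: candidate bandwidths in the order they were tried,
# with made-up AICc scores (lower is better).
trace <- data.frame(
  bandwidth = c(9000, 5500, 3400, 2100, 2900, 2600),
  aicc      = c(5100, 4980, 4890, 4930, 4875, 4881)
)

last_tried <- trace$bandwidth[nrow(trace)]            # last value printed
best       <- trace$bandwidth[which.min(trace$aicc)]  # value with the lowest AICc

last_tried
best
```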
To assess the performance of the GWLR model, we first convert the SDF object into a data frame using the code chunk below.
gwr.fixed <- as.data.frame(gwlr.fixed$SDF)
Next, we label yhat values greater than or equal to 0.5 as TRUE and all other values as FALSE. The result of this logical comparison is saved into a field called most.
gwr.fixed <- gwr.fixed %>% mutate(most = ifelse(yhat >= 0.5, TRUE, FALSE))
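With the thresholded predictions in hand, a natural next step is to compare them with the observed outcomes. The sketch below is a base R illustration on made-up numbers: the column names y, yhat and most mirror the fields discussed above, but the presence of an observed y column in the data frame is an assumption here.

```r
# Toy fitted probabilities and observed outcomes (made-up values).
res <- data.frame(
  y    = c(TRUE, FALSE, FALSE, FALSE, TRUE),
  yhat = c(0.9, 0.2, 0.6, 0.4, 0.7)
)

# Same rule as above: predicted class is TRUE when yhat >= 0.5.
res$most <- res$yhat >= 0.5

# Confusion matrix: rows are predictions, columns are observed values.
conf <- table(predicted = res$most, actual = res$y)

# Overall accuracy: share of rows where prediction and observation agree.
accuracy <- mean(res$most == res$y)

conf
accuracy
```

The off-diagonal cells of conf count the misclassified water points, which is often more informative than accuracy alone when the two classes are unevenly sized.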